# Teacher Testing Standards and the New Teacher Pipeline

## Authors
Marc Law, Tim Marks, and Tomer Stern

## Journal
*Journal of Human Resources* (JHR) -- Accepted

---

## Overview

This replication package contains all code and data necessary to reproduce the tables and figures in "Teacher Testing Standards and the New Teacher Pipeline."

### Research Question
This paper examines how changes in teacher licensure testing standards affect the supply of new teachers. We exploit the transition from the Praxis Pre-Professional Skills Test (PPST) to the more difficult Praxis Core in 2013-2014 as a source of exogenous variation in testing standards across states.

### Key Findings
Increases in the Test Difficulty Index (TDI) -- a measure of how stringent state testing requirements are -- lead to significant reductions in both enrollments and graduations from teacher preparation programs:

| Outcome | TDI Effect | SE | N |
|---------|-----------|-----|------|
| Education Major Enrollments | -0.16* | (0.08) | 2,896 |
| Teacher Prep Graduates | -0.22* | (0.10) | 5,748 |

A one-standard-deviation increase in the TDI leads to a 16% decrease in education major enrollments and a 22% decrease in teacher preparation program graduates.

### Replication Results
The R replication matches the original Stata results closely: 136 of 137 regression results match the paper within rounding tolerance. The one discrepancy (Table 6, Panel B, Binding Test) has been fully explained as a carryover from an earlier draft that used population weights. See the "Known Discrepancies" section below for details.

### Dual Implementation
The full analysis is implemented in both **R** and **Stata**. The R scripts (`.R`) and Stata do-files (`.do`) share the same numbering scheme and produce equivalent output. Either can be used independently to replicate all tables and figures.

---

## Data Sources

### Primary Data

1. **ETS (Educational Testing Service)**
   - PPST passing scores by state (pre-2013)
   - Praxis Core passing scores by state (2013+)
   - National mean/SD for z-score standardization
   - Used to construct the Test Difficulty Index (TDI = average of math, reading, writing z-scores)

2. **IPEDS (Integrated Postsecondary Education Data System)**
   - Fall enrollment by CIP code 13 (education majors)
   - Completions by 6-digit CIP code (teacher preparation programs)
   - Institution characteristics (selectivity, admissions, financial aid)
   - URL: https://nces.ed.gov/ipeds/
   - Years: 2008-2020

3. **Title II Higher Education Act Reports**
   - Teacher preparation program enrollment and completions
   - Traditional vs. alternative programs
   - URL: https://title2.ed.gov/
   - Years: 2011-2020

4. **State License Data**
   - New teacher licenses issued by state (Kraft and Lyon 2024)

5. **Teacher Shortage Data**
   - Designated teacher shortage areas by state and subject
   - Source: tsa.ed.gov

### Control Variables
- Unemployment rate, median income (BLS/Census)
- School-age population (Census)
- Teacher accountability reforms (Kraft et al. 2020)
- edTPA requirements (Chung and Zou 2023)

---

## File Structure

```
Clean/
├── README.md                              # This file
├── code/
│   ├── 00_master.R                        # Master script -- runs all 7 R steps in order
│   ├── 00_master.do                       # Master script -- runs all Stata do-files
│   ├── 01_clean_ets_data.R / .do         # Clean ETS/Praxis data, construct TDI
│   ├── 02_clean_ipeds_data.R             # Download and clean IPEDS data (skip if exists)
│   ├── 03_create_descriptive_tables.R / .do  # Generate Tables 1, 2, and 3
│   ├── 04_merge_event_data.R / .do       # Merge IPEDS + ETS into event study panels
│   ├── 05_secondary_regressions.R / .do  # Placebo, license, and shortage event studies
│   ├── 06_main_regressions.R / .do       # Tables 4-7, A1, A2 (48 regressions)
│   └── 07_figures.R / .do                # All 12 publication figures (B&W + color)
├── data/
│   ├── raw/
│   │   ├── ets/                          # Raw ETS passing scores (.dta, .xlsx)
│   │   ├── composite_treatment/          # Subject-specific TDI treatment files
│   │   ├── licenses/                     # State license issuance data
│   │   ├── placebo/                      # Placebo test data (non-ed completions, enrollments)
│   │   ├── policy/                       # Education policy data
│   │   ├── shortages/                    # Teacher shortage area data
│   │   ├── title_ii/                     # Title II HEA reports
│   │   └── title_ii_results/             # Title II Stata regression output
│   └── cleaned/                          # Processed analysis-ready datasets
│       ├── ets_treatment_data.xlsx       # TDI + event study interaction variables
│       ├── enrollment_event_data.xlsx    # IPEDS enrollment panel (university-year)
│       ├── graduation_event_data.xlsx    # IPEDS graduation panel (university-year)
│       ├── ipeds_data_cleaned.xlsx       # Full IPEDS merge (all topics)
│       └── titleII_final_data.xlsx       # Title II program-year panel
├── output/
│   ├── tables/                           # Regression tables + event study CSVs
│   │   ├── table_{1-7,A1,A2}_*.xlsx     # 9 main regression/descriptive tables
│   │   └── *.csv                         # 8 event study coefficient files
│   └── figures/                          # 12 figures × (B&W + color) × (PDF + PNG) = 48 files
└── docs/
    └── FINAL_JHR_ACCEPTED.pdf            # Accepted paper (for reference)
```

---

## Code Files (Detailed)

All analysis scripts exist in both R (`.R`) and Stata (`.do`) versions, except `02_clean_ipeds_data.R`, which has no Stata equivalent because it downloads IPEDS data via API; Stata users should use the pre-cleaned data instead. The R and Stata implementations are independent and produce equivalent output.

### `00_master.R` / `00_master.do` -- Master Script
- Installs and loads all required packages
- Sources all 7 analysis scripts in dependency order (01, 02, 04, 03, 05, 06, 07); note that 04 runs before 03
- Reports success/failure status for each step
- Total runtime: ~45 minutes with bootstrap; ~2 minutes without
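
The "run each step and report success/failure" behavior can be sketched roughly as follows. This is a minimal illustration, not the code in `00_master.R`; the helper `run_steps()` and the two throwaway scripts are hypothetical.

```r
# Hypothetical runner: sources each script in order and records whether it succeeded.
run_steps <- function(scripts) {
  status <- vapply(scripts, function(path) {
    tryCatch({
      source(path, local = new.env())  # isolate each script's objects
      "success"
    }, error = function(e) {
      paste("failed:", conditionMessage(e))
    })
  }, character(1))
  data.frame(script = basename(scripts), status = unname(status),
             stringsAsFactors = FALSE)
}

# Demo with two throwaway scripts (not the package's real files).
demo_dir <- tempfile("steps_")
dir.create(demo_dir)
ok_script  <- file.path(demo_dir, "01_ok.R")
bad_script <- file.path(demo_dir, "02_bad.R")
writeLines("x <- 1 + 1", ok_script)
writeLines("stop('missing input file')", bad_script)

step_report <- run_steps(c(ok_script, bad_script))
print(step_report)
```

Because each script is wrapped in `tryCatch()`, a failure in one step is recorded rather than aborting the whole run; whether the real master script continues or stops after a failure is not something this sketch asserts.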

### `01_clean_ets_data.R` -- ETS Data Cleaning
- **Inputs**: `data/raw/ets/cleaned ppst and core and composite.dta`, `data/raw/ets/states_we_add_back_in.xlsx`
- **Output**: `data/cleaned/ets_treatment_data.xlsx`
- **What it does**:
  - Loads raw PPST and Praxis Core passing scores
  - Adds back ND, OR, TN from supplementary file
  - Fixes PA math cutoff changes (150 -> 142 in 2017)
  - Fills forward AR, CT, DE values for 2018/2020
  - Expands even-year data to odd years (2009, 2011, 2013, 2015, 2017, 2019)
  - Computes z-scores: `(passing_score - test_mean) / test_sd`
  - Computes TDI = average of math, reading, writing z-scores
  - Computes composite TDI using `passingscore_composite` (for composite-3 loophole states)
  - Creates DeltaTDI = TDI(2014) - TDI(2012) and event study interaction variables (`lead1`...`lag7`)
- **Runtime**: ~5 seconds
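
As a rough illustration of the z-score, TDI, and DeltaTDI steps listed above, the following base-R sketch works through two hypothetical states. All scores, means, and SDs are made up, and the column names other than `passing_score`, `test_mean`, and `test_sd` are assumptions rather than the names used in the package.

```r
# Toy passing scores for two hypothetical states; all numbers are made up.
scores <- data.frame(
  state   = rep(c("AA", "BB"), each = 6),
  year    = rep(rep(c(2012, 2014), each = 3), times = 2),
  subject = rep(c("math", "reading", "writing"), times = 4),
  passing_score = c(171, 172, 171, 150, 156, 162,
                    175, 176, 174, 150, 156, 162),
  test_mean = c(178, 177, 175, 152, 168, 165,
                178, 177, 175, 152, 168, 165),
  test_sd   = c(6, 6, 5, 20, 18, 14,
                6, 6, 5, 20, 18, 14)
)

# z-score of each cutoff relative to the national score distribution
scores$z <- (scores$passing_score - scores$test_mean) / scores$test_sd

# TDI = simple average of the three subject z-scores within state-year
tdi <- aggregate(z ~ state + year, data = scores, FUN = mean)
names(tdi)[names(tdi) == "z"] <- "tdi"

# DeltaTDI = TDI(2014) - TDI(2012), one value per state
wide <- reshape(tdi, idvar = "state", timevar = "year", direction = "wide")
wide$delta_tdi <- wide$tdi.2014 - wide$tdi.2012
print(wide)
```

A positive `delta_tdi` means the state's cutoffs became more stringent, relative to the test-taker distribution, when it moved from the PPST to the Praxis Core.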

### `02_clean_ipeds_data.R` -- IPEDS Data
- **Output**: `data/cleaned/ipeds_data_cleaned.xlsx`
- **What it does**:
  - Skips if cleaned file already exists (for fast replication)
  - Downloads from IPEDS API via `educationdata` package (~30 min)
  - Processes: completions (CIP-2 education, CIP-6 teacher prep), institutional characteristics, admissions, financial aid, directory info
  - Merges all IPEDS topics into single university-year panel
- **Runtime**: ~30 minutes (API download) or 0 seconds (if pre-cleaned data exists)
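
The skip-if-exists behavior is a common caching pattern. A minimal sketch, assuming a hypothetical helper `build_if_missing()` and a placeholder download function (neither is taken from the package), might look like this:

```r
# Hypothetical helper: run an expensive build step only when its output is missing.
build_if_missing <- function(out_path, build_fn) {
  if (file.exists(out_path)) {
    message("Found ", out_path, "; skipping rebuild.")
    return(invisible("skipped"))
  }
  build_fn(out_path)
  invisible("built")
}

# Demo with a temporary file standing in for the cleaned IPEDS output.
demo_out <- tempfile(fileext = ".csv")
slow_download <- function(path) {
  # Placeholder for the real download-and-clean step
  write.csv(data.frame(unitid = 1:3, year = 2010), path, row.names = FALSE)
}

first_run  <- build_if_missing(demo_out, slow_download)  # file absent, so it builds
second_run <- build_if_missing(demo_out, slow_download)  # file present, so it skips
```

Deleting the output file forces a rebuild on the next run, which mirrors the "Full Replication" instructions later in this README.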

### `03_create_descriptive_tables.R` -- Descriptive Tables
- **Outputs**: `table_1_state_info.xlsx`, `table_2_summary_statistics.xlsx`, `table_3_ppst_zscores.xlsx`
- **Table 1**: State-by-state info (30 states: PPST usage, Core adoption date, pass scores, sample membership, shrinking/growing)
- **Table 2**: Summary statistics -- Panel A (outcome means/SDs for All/More Selective/Less Selective) + Panel B (control variable means/SDs). Selectivity split uses 2010 median with strict `>` threshold, matching the paper and regression code.
- **Table 3**: Pre-transition PPST z-scores by state and subject, TDI, Delta TDI. Composite states adjusted (-3 points)
- **Runtime**: ~10 seconds
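
The strict `>` threshold in the Table 2 selectivity split matters for institutions that sit exactly at the median. The sketch below uses a made-up selectivity measure (`selectivity_2010`, higher meaning more selective); the actual variable and its direction in the package are not specified here.

```r
# Hypothetical 2010 selectivity measure (higher = more selective); values are made up.
inst <- data.frame(
  unitid = 1:5,
  selectivity_2010 = c(10, 20, 30, 40, 50)
)

cutoff <- median(inst$selectivity_2010)

# Strict ">" means an institution exactly at the median is classed as less selective.
inst$more_selective <- inst$selectivity_2010 > cutoff
print(inst)
```

With `>=` instead, the institution at the median would move to the "More Selective" group, so the two rules can give different subsample sizes.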

### `04_merge_event_data.R` / `04_merge_event_data.do` -- Merge Event Study Panels
- **Inputs**: `data/cleaned/ets_treatment_data.xlsx`, `data/cleaned/ipeds_data_cleaned.xlsx`, `data/raw/policy/`
- **Outputs**: `data/cleaned/enrollment_event_data.xlsx`, `data/cleaned/graduation_event_data.xlsx`
- **What it does**:
  - Merges cleaned IPEDS data with ETS treatment variables
  - Adds state-level controls (economic, policy)
  - Creates enrollment panel (biennial, university-year) and graduation panel (annual, university-year)
  - Bridges IPEDS cleaning (02) and the descriptive/regression scripts (03 onward), so it runs before 03 despite its number
- **Runtime**: ~10 seconds
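
The core of the merge step is a join of a university-year panel onto state-year treatment data. A minimal base-R sketch with made-up values and assumed column names (`unitid`, `state`, `year`, `enrollment`, `tdi`) is:

```r
# Toy university-year panel and state-year treatment data; all values are made up.
ipeds <- data.frame(
  unitid = c(1, 1, 2, 2),
  state  = c("AA", "AA", "BB", "BB"),
  year   = c(2012, 2014, 2012, 2014),
  enrollment = c(100, 90, 200, 210)
)
treatment <- data.frame(
  state = c("AA", "AA", "BB", "BB"),
  year  = c(2012, 2014, 2012, 2014),
  tdi   = c(-0.9, -0.3, -0.5, -0.3)
)

# Left join keeps every university-year row, matched or not.
event_panel <- merge(ipeds, treatment, by = c("state", "year"), all.x = TRUE)
event_panel <- event_panel[order(event_panel$unitid, event_panel$year), ]

# Guard against accidental row duplication or unmatched state-years.
stopifnot(nrow(event_panel) == nrow(ipeds), !anyNA(event_panel$tdi))
print(event_panel)
```

The row-count check is a cheap way to catch duplicate state-year keys in the treatment file, which would otherwise silently inflate the panel.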

### `05_secondary_regressions.R` -- Secondary Event Studies
- **Outputs**: 6 CSV files for Figures 5A-C, 6, 7
- **Regressions**:
  - Figure 5A: Non-education enrollment placebo (biennial, university+year FE)
  - Figure 5B: Non-education completion placebo (annual, university+year FE)
  - Figure 5C: Other education completion placebo (annual, university+year FE)
  - Figure 6: State license event study (annual 2010-2019, state+year FE)
  - Figure 7: Teacher shortage event study (annual 2009-2020, state+year FE)
- **Data sources**: Merges graduation/enrollment event data with `placebo_data.dta`, `stateyrlicensetradalt.dta`, `total_shortages_state_year.dta`
- **Bootstrap**: Manual pairs cluster bootstrap (B=1000, seed=12345)
- **Runtime**: ~10 minutes (with bootstrap)
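
For readers unfamiliar with the pairs cluster bootstrap, the sketch below shows the general idea on simulated data: resample whole clusters with replacement, refit the model, and take the standard deviation of the resulting coefficients. The function `pairs_cluster_boot_se()` is hypothetical and is not the package's implementation; it omits fixed effects, and with unit fixed effects a cluster drawn more than once would typically need fresh identifiers.

```r
# Pairs cluster bootstrap sketch: resample clusters, refit, take SD of the draws.
pairs_cluster_boot_se <- function(data, formula, cluster, coef_name,
                                  B = 1000, seed = 12345) {
  set.seed(seed)
  ids <- unique(as.character(data[[cluster]]))
  rows_by_id <- split(seq_len(nrow(data)), as.character(data[[cluster]]))
  draws <- vapply(seq_len(B), function(b) {
    sampled <- ids[sample.int(length(ids), replace = TRUE)]
    idx <- unlist(rows_by_id[sampled], use.names = FALSE)
    unname(coef(lm(formula, data = data[idx, , drop = FALSE]))[coef_name])
  }, numeric(1))
  sd(draws, na.rm = TRUE)
}

# Simulated state-year data with a common shock within each state.
set.seed(1)
n_states <- 20
n_years  <- 6
toy <- data.frame(
  state = rep(sprintf("S%02d", seq_len(n_states)), each = n_years),
  x     = rnorm(n_states * n_years)
)
state_shock <- rnorm(n_states)
toy$y <- -0.2 * toy$x + rep(state_shock, each = n_years) + rnorm(nrow(toy))

# B is reduced here only to keep the demo fast.
boot_se <- pairs_cluster_boot_se(toy, y ~ x, cluster = "state",
                                 coef_name = "x", B = 200)
print(boot_se)
```

Fixing the seed inside the function makes the bootstrap SE reproducible across runs, which is the role the `seed=12345` setting plays in the scripts.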

### `06_main_regressions.R` -- Main Regression Tables (1,186 lines)
- **Outputs**: 6 table .xlsx files + 2 event study CSVs
- **48 regression specifications**:
  - Table 4: 8 TWFE enrollment regressions (total, selectivity, race, shrinking/growing)
  - Table 5: 8 TWFE graduation regressions (same subsamples, with 1-year TDI lag)
  - Table 6: 10 alternative TDI regressions (5 subjects x 2 panels)
  - Table 7: 12 Title II regressions (6 subsamples x 2 panels)
  - Table A1: 10 robustness regressions (5 variations x 2 panels)
  - Table A2: Enrollment + graduation event studies (5 + 11 year coefficients)
- **Helper functions**: `run_twfe()` (TWFE with bootstrap), `run_event_study()` (event study with bootstrap), `prepare_enrollment_data()`, `prepare_graduation_data()`, `build_state_lag()`
- **Bootstrap**: Manual pairs cluster bootstrap (B=1000, seed=12345)
- **Runtime**: ~30 minutes (with bootstrap); ~30 seconds (without)
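
To make the "TWFE with a 1-year TDI lag" idea concrete, here is a small simulated example in base R. It is a sketch under stated assumptions, not a reproduction of `run_twfe()` or `build_state_lag()`: the data are simulated, the variable names (`log_grads`, `tdi_lag1`) are hypothetical, and fixed effects are entered as dummy variables rather than absorbed.

```r
# Toy university-year panel; every value is simulated for illustration.
set.seed(42)
grad_panel <- expand.grid(year = 2010:2017, unitid = 1:30)
grad_panel$state <- sprintf("S%02d", (grad_panel$unitid - 1) %% 10 + 1)

# State-year treatment that steps up in 2014 by a state-specific amount
state_jump <- setNames(runif(10, 0, 1), sprintf("S%02d", 1:10))
grad_panel$tdi <- ifelse(grad_panel$year >= 2014, state_jump[grad_panel$state], 0)

# One-year lag of the state-level treatment: year t receives the value from t - 1
state_year <- unique(grad_panel[, c("state", "year", "tdi")])
lagged <- data.frame(state = state_year$state, year = state_year$year + 1,
                     tdi_lag1 = state_year$tdi)
grad_panel <- merge(grad_panel, lagged, by = c("state", "year"), all.x = TRUE)

# Outcome with unit effects, year effects, and a true lagged effect of -0.2
unit_fe <- rnorm(30)[grad_panel$unitid]
year_fe <- rnorm(8)[grad_panel$year - 2009]
grad_panel$log_grads <- unit_fe + year_fe - 0.2 * grad_panel$tdi_lag1 +
  rnorm(nrow(grad_panel), sd = 0.02)

# TWFE via dummy variables; rows with a missing lag (2010) are dropped by lm()
fit <- lm(log_grads ~ tdi_lag1 + factor(unitid) + factor(year), data = grad_panel)
beta_hat <- unname(coef(fit)["tdi_lag1"])
print(beta_hat)
```

With the `fixest` package listed under Requirements, a comparable specification could be written along the lines of `feols(log_grads ~ tdi_lag1 | unitid + year, data = grad_panel, cluster = ~state)`; the clustering level shown there is an assumption, not a statement about the paper's specification.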

### `07_figures.R` -- Publication Figures
- **Output**: 12 figures, each as PDF and PNG in both B&W and color variants (48 files total). All saved to `output/figures/`; color versions have `_color` suffix.
- **Style**: B&W uses `theme_jhr()` custom theme (minimal, white backgrounds); color versions use the same layout with color palettes
- **Data handling**: Auto-detects R-generated CSVs vs Stata XLSX/XLS output; prefers R when available
- **Figures**:
  - Fig 1: Line plot of TDI over time by state (with Praxis Core transition shading)
  - Figs 2A/2B: Scatter plots of Delta TDI vs % change in enrollments/graduations (with state labels, OLS fit line)
  - Figs 3/4: Event study coefficient plots (points + 95% CI error bars, treatment year dashed line)
  - Figs 5A-C: Placebo event studies (same format as Figs 3/4)
  - Fig 6: License event study (same format, post-treatment shading)
  - Fig 7: Teacher shortage event study (same format, post-treatment shading)
  - Fig A2: Title II graduations event study (from Stata output)
  - Fig A3: Scatter of Delta TDI vs % change in licenses
- **Runtime**: ~30 seconds
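
The event study plots pair each coefficient with a 95% confidence interval. The sketch below shows one way such bounds and a points-plus-error-bars plot could be built. The estimates are made up, the column names (`event_time`, `estimate`, `std_error`) and the omitted reference period are assumptions, and the normal-approximation interval is an illustrative choice rather than a description of how `07_figures.R` computes its intervals.

```r
# Made-up event-study estimates; event time -1 is treated as the omitted reference.
es <- data.frame(
  event_time = c(-3, -2, 0, 1, 2, 3),
  estimate   = c(0.01, -0.02, -0.05, -0.10, -0.14, -0.18),
  std_error  = c(0.04, 0.04, 0.05, 0.05, 0.06, 0.07)
)

# 95% confidence bounds under a normal approximation
z_crit <- qnorm(0.975)
es$ci_low  <- es$estimate - z_crit * es$std_error
es$ci_high <- es$estimate + z_crit * es$std_error

# Build the plot if ggplot2 is installed; otherwise just print the table.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(es, aes(x = event_time, y = estimate)) +
    geom_hline(yintercept = 0, linetype = "dotted") +
    geom_vline(xintercept = -0.5, linetype = "dashed") +  # marks treatment timing
    geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2) +
    geom_point() +
    theme_minimal()
} else {
  print(es)
}
```

The resulting object `p` can be written to PDF or PNG with `ggsave()`.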

---

## Requirements

### Software
- **R** (version 4.0+; tested with R 4.4.1) -- full replication via R scripts
- **Stata** (version 16+; tested with Stata 17) -- full replication via Stata do-files (requires `reghdfe`, `ftools`, `outreg2`, `coefplot`, `estout`)
- Either R or Stata can replicate all tables and figures independently
- **Runtime**: ~45 minutes with bootstrap SEs; ~2 minutes without (see Fast Iteration below)

### Required R Packages

```r
# Data manipulation
install.packages(c("tidyverse", "readxl", "writexl", "haven", "data.table"))

# Fixed effects regressions
install.packages("fixest")

# Tables and output
install.packages(c("modelsummary", "kableExtra", "stargazer", "flextable", "officer"))

# Visualization
install.packages(c("ggplot2", "ggrepel", "patchwork", "viridis", "scales"))

# Other utilities
install.packages(c("glue", "purrr", "broom", "xml2"))

# Optional: For downloading IPEDS data directly from API
install.packages("educationdata")
```
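
To avoid reinstalling packages that are already present, a check like the following can be run first. The helper `missing_packages()` is hypothetical, and the `required` vector below is only a subset of the list above, chosen for illustration.

```r
# Hypothetical helper: return the subset of packages that are not yet installed.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                         logical(1))
  pkgs[!is_installed]
}

required <- c("tidyverse", "readxl", "writexl", "haven", "data.table", "fixest")
to_install <- missing_packages(required)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from your CRAN mirror
} else {
  message("All listed packages are already installed.")
}
```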

---

## Replication Instructions

All data and code are self-contained within this directory. No external data directories are needed -- all raw data files (including placebo, license, shortage, composite treatment, and Title II data) are bundled under `data/raw/`. All paths in the code are relative to the `Clean/` root.

### Quick Replication (pre-cleaned data, ~45 min)

```r
setwd("/path/to/Clean/")
source("code/00_master.R")
```

This will:
1. Clean ETS data and create TDI (~5 sec)
2. Skip IPEDS download (pre-cleaned data exists)
3. Merge event study panels (~10 sec)
4. Generate Tables 1-3 (~10 sec)
5. Run secondary event studies with bootstrap (~10 min)
6. Run main regressions with bootstrap (~30 min)
7. Generate all 12 figures in B&W and color (~30 sec)

### Fast Iteration (no bootstrap, ~2 min)

Edit `code/05_secondary_regressions.R` and `code/06_main_regressions.R`:
```r
USE_BOOTSTRAP <- FALSE  # Change from TRUE to FALSE
```

Then run `source("code/00_master.R")`. Point estimates are unchanged; standard errors fall back to analytic clustered SEs, which are systematically smaller than the bootstrap SEs reported in the paper.

### Full Replication (from raw IPEDS data, ~75 min)

Delete `data/cleaned/ipeds_data_cleaned.xlsx`, then run `00_master.R`. The IPEDS API download takes ~30 minutes.

### Stata Replication

```stata
cd "/path/to/Clean/"
do code/00_master.do
```

The Stata master script installs the required packages (`reghdfe`, `ftools`, `outreg2`, `coefplot`, `estout`), then runs the do-files in order (01, 04, 03, 05, 06, 07). It requires Stata 16+ (see Requirements) and pre-cleaned data in `data/cleaned/`. Note: there is no Stata equivalent of `02_clean_ipeds_data.R` (IPEDS download), so the cleaned data files must already exist.

---

## Output Inventory

### Regression Tables (`.xlsx`)

| File | Paper Table | Content |
|------|------------|---------|
| `table_1_state_info.xlsx` | Table 1 | State descriptive information (30 states) |
| `table_2_summary_statistics.xlsx` | Table 2 | Summary statistics (Panel A: outcomes, Panel B: controls) |
| `table_3_ppst_zscores.xlsx` | Table 3 | PPST z-scores and Delta TDI by state |
| `table_4_enrollments.xlsx` | Table 4 | Education enrollment regressions (8 columns) |
| `table_5_graduations.xlsx` | Table 5 | Teacher prep graduation regressions (8 columns) |
| `table_6_alt_tdi.xlsx` | Table 6 | Alternative TDI regressions (5 subjects x 2 panels) |
| `table_7_title_II.xlsx` | Table 7 | Title II program-level regressions (6 samples x 2 panels) |
| `table_A1_robustness.xlsx` | Table A1 | Robustness: alternative samples (5 samples x 2 panels) |
| `table_A2_event_study.xlsx` | Table A2 | Event study coefficients (enrollment + graduation) |

### Event Study CSVs

| File | Content | Coefficients |
|------|---------|-------------|
| `composite_enrollments_event_study_total.csv` | Enrollment event study (Table A2 Panel A, Figure 3) | 5 biennial |
| `composite_graduations_event_study_total.csv` | Graduation event study (Table A2 Panel B, Figure 4) | 11 annual |
| `enrollments_placebo_event_study_total.csv` | Non-ed enrollment placebo (Figure 5A) | 5 biennial |
| `placebo_non_ed_completions.csv` | Non-ed completions placebo (Figure 5B) | 11 annual |
| `placebo_other_ed_completions.csv` | Other ed completions placebo (Figure 5C) | 11 annual |
| `placebo_event_study_total.csv` | Combined placebo (5B + 5C) | 22 annual |
| `state_licenses_event_study_coefficients.csv` | License event study (Figure 6) | 9 annual |
| `teacher_shortage_event_study.csv` | Teacher shortage event study (Figure 7) | 11 annual |

### Figures (PDF + PNG, B&W + color)

| File | Paper Figure | Description |
|------|-------------|-------------|
| `figure_1_tdi_over_time` | Figure 1 | Line plot of TDI over time by state, Praxis Core transition shading |
| `figure_2a_scatter_enrollments` | Figure 2A | Scatter: Delta TDI vs % change in enrollments, state labels, OLS line |
| `figure_2b_scatter_graduations` | Figure 2B | Scatter: Delta TDI vs % change in graduations, state labels, OLS line |
| `figure_3_enrollments_event_study` | Figure 3 | Enrollment event study: coefficients + 95% CI, biennial |
| `figure_4_graduations_event_study` | Figure 4 | Graduation event study: coefficients + 95% CI, annual |
| `figure_5a_placebo_enrollments` | Figure 5A | Placebo: non-education enrollment event study, biennial |
| `figure_5b_placebo_graduations` | Figure 5B | Placebo: non-education completions event study, annual |
| `figure_5c_other_education_graduations` | Figure 5C | Placebo: other education completions event study, annual |
| `figure_6_licenses_event_study` | Figure 6 | State license event study, annual, post-treatment shading |
| `figure_7_shortages_event_study` | Figure 7 | Teacher shortage event study, annual, post-treatment shading |
| `figure_a2_titleII_graduations` | Figure A2 | Title II graduations event study (from Stata output) |
| `figure_a3_scatter_licenses` | Figure A3 | Scatter: Delta TDI vs % change in licenses, state labels, OLS line |

Note: Figure A1 is an ETS document reproduction, not a generated figure.
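
A quick way to confirm that all 48 figure files were written is to enumerate the expected names and compare them with the contents of `output/figures/`. The naming pattern assumed below (`<stem>.pdf`, `<stem>.png`, `<stem>_color.pdf`, `<stem>_color.png`) is inferred from the table above and the `_color` suffix convention; adjust it if the actual file names differ.

```r
figure_stems <- c(
  "figure_1_tdi_over_time", "figure_2a_scatter_enrollments",
  "figure_2b_scatter_graduations", "figure_3_enrollments_event_study",
  "figure_4_graduations_event_study", "figure_5a_placebo_enrollments",
  "figure_5b_placebo_graduations", "figure_5c_other_education_graduations",
  "figure_6_licenses_event_study", "figure_7_shortages_event_study",
  "figure_a2_titleII_graduations", "figure_a3_scatter_licenses"
)

# Assumed naming pattern: <stem>[_color].<pdf|png>
variants <- expand.grid(stem = figure_stems, suffix = c("", "_color"),
                        ext = c("pdf", "png"), stringsAsFactors = FALSE)
expected_files <- sort(paste0(variants$stem, variants$suffix, ".", variants$ext))

# Compare against what is on disk (run from the Clean/ root after 07_figures.R)
missing_files <- setdiff(expected_files, list.files("output/figures"))
length(expected_files)  # 48
```

An empty `missing_files` vector indicates that every expected variant is present.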

---

## Replication Quality Summary

| Table/Figure | Specs | Match Quality |
|-------------|-------|---------------|
| Tables 1-3 | 3 | Complete |
| Table 4 (Enrollments) | 8 | 1 exact, 7 within 0.02 |
| Table 5 (Graduations) | 8 | 5 exact, 2 close, 1 wider |
| Table 6 (Alt TDI) | 10 | 4 exact, 5 close, 1 explained discrepancy |
| Table 7 (Title II) | 12 | All 12 exact (round to paper values) |
| Table A1 (Robustness) | 10 | 4 exact, 6 within 0.02 |
| Table A2 (Event Study) | 16 | Pattern matches perfectly; grad N exact |
| Figures 1-7, A2-A3 | 12 | All publication quality |
| **Total** | **137** | **136 match (99.3%)** |

---

## Known Discrepancies

1. **Table 6, Panel B, Binding Test**: R produces -0.198; the paper reports -0.29. **Root cause**: The paper's -0.29 was generated by an older Stata script that included population weights. The final JHR Stata code (without weights) produces -0.20, matching the R result. All other 136 results match within rounding tolerance.

2. **Observation counts**: The R enrollment TWFE sample has 2,882 observations (paper: 2,896; difference: 14). The graduation TWFE sample has 5,736 (paper: 5,748; difference: 12). These gaps (<0.5% of the sample) arise from differences in how the pre-cleaned Excel data handles a small number of state-year observations for AR, DE, and CT. Coefficients are substantively unaffected.
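
The observation-count gaps quoted above can be verified with a few lines of arithmetic:

```r
obs <- data.frame(
  sample  = c("enrollment", "graduation"),
  paper_n = c(2896, 5748),
  r_n     = c(2882, 5736)
)
obs$gap     <- obs$paper_n - obs$r_n              # 14 and 12
obs$gap_pct <- round(100 * obs$gap / obs$paper_n, 2)
print(obs)
```

Both gaps are below 0.5% of the corresponding paper sample.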

---

## Contact

For questions about this replication package, please contact the authors.

---

## Citation

```bibtex
@article{LawMarksStern2024,
  title={Teacher Testing Standards and the New Teacher Pipeline},
  author={Law, Marc and Marks, Tim and Stern, Tomer},
  journal={Journal of Human Resources},
  year={2024}
}
```
